WO 02/01312

25 FEB 2002



METHOD AND SYSTEM OF INTELLIGENT 
INFORMATION PROCESSING IN A NETWORK 



FIELD OF INVENTION: 

The present invention relates to a method and system of intelligent information
processing in a wide area network, such as the Internet, through a native
language, such as Chinese. More particularly, it relates to a method and system
of Chinese intelligent search in the Internet.

BACKGROUND OF THE INVENTION

A network is a distributed communicating system of computers that are
interconnected by various electronic communication links and computer
software protocols. A WAN (wide area network) is a geographically dispersed
telecommunications network, and the term distinguishes a broader
telecommunication structure from a local area network (LAN). A wide area
network may be privately owned or rented, but the term usually connotes the
inclusion of public (shared user) networks. A particularly well-known WAN is
the international Information Infrastructure, commonly called the Internet. The
Internet is a worldwide network whose Electronic Resources include (but are
not limited to) text files, graphic files in various formats, World Wide Web
"pages" in HTML (Hyper Text Mark-Up Language) format or various extensions,
including XML, files in various and arbitrary binary formats, and electronic mail
addresses. As in many other networks, the scheme for denotation of an
Electronic Resource on the Internet is an "electronic address" which uniquely
identifies its location within the network and within the computer in which it
resides.



On the Internet, for example, such an electronic address is called a Universal
Resource Locator or URL, and consists of a specially formatted concatenation
of information about the type of protocol needed to access the resource, a
Network Domain identifier, identification of the particular computer on which the
Electronic Resource is located, a port number, directory path information within
the computer's file structure, and the file name of the resource. Internet URLs
and similar denotation schemes for Electronic Resources are cumbersome for
human users. URLs are often more than 50 characters long and contain
information that is neither interesting nor meaningful to seekers of information.
Thus, some work has been done to make the search of web addresses
under URLs more meaningful to information seekers or searchers. That is,
the seekers or searchers do not have to remember the exact URLs in the
search engines, but only some naturally used words or terms.

U.S. Patent No. 5,764,906 describes a system for providing and maintaining
short aliases for information resources and their providers and a system for
translation of these aliases to meaningful electronic addresses, such as URLs,
facsimile and voice telephone numbers and electronic mail addresses, and for
accessing the resources by means of these addresses. Similarly, PCT
application WO 99/39275, published on August 5, 1999, describes a method of
navigating the Internet to a resource based upon a natural language name, to
a resource that is stored in a network and identified by a location identifier.
Certain software products have become commercially available to assist the
access of Internet resources using natural language names.



At present, many such services are available. For instance, RealNames
(Central Co. http://www.realnames.com) substitutes short "keywords" for
complicated Internet addresses, or URLs, and has already offered its service
through Microsoft's Internet Explorer Web browser and MSN Web portal.
Microsoft also announced the inclusion of RealNames in its Web browser
software. RealNames' service is an Internet equivalent to America Online's
popular keyword system, part of its proprietary online service. The system
allows AOL members to type a common phrase to find specific content
channels. Similarly, Netword Agent software (http://www.netword.com) also
allows a user to enter an Internet keyword instead of a URL. In addition, the
Internet Engineering Task Force (IETF) is developing an Internet keywords
standard. The IETF already has formed a working group devoted to devising a
"common name resolution protocol," or a standard way of implementing Web
keywords.



However, the Internet keyword software products, such as those from
RealNames or Netword, are either incorporated into a browser or provided as a
plug-in for the browser. Generally, when a new version of the browser is
released, the plug-in software must also be updated.

Furthermore, the Internet keyword software products or keyword searches are
either unsuitable or cumbersome for processing certain native languages, such
as Asian languages, particularly Chinese, Japanese and Korean, or any other
pictographic languages. Each character may not have an exact meaning, and
may have various meanings when combined with one or more other
characters. Therefore, normal keyword search techniques cannot be used to
obtain the desired search results for such electronic addresses quickly and
accurately.



It is then an object of the present invention to provide a method of processing
search inquiries in native languages, such as Chinese.

It is another object of the present invention to provide a system of information
processing in the Internet using native languages, such as Chinese.

It is a further object of the present invention to provide a method and system of
Chinese intelligent search in the Internet, based either on the characters or
on "pinyin", that is, the pronunciation of the characters.

It is still a further object of the present invention to provide a method and
system of Chinese intelligent search in the Internet that automatically obtains
correct results even if the pinyin is entered with a southern accent.

SUMMARY OF THE INVENTION

In accordance with the present invention, a method and system of intelligent
search in the Internet comprises identifying whether the input is one of a URL
address, native language characters, and native language pronunciation
notations. If the input is a regular URL, the text input is queried in a domain
name server and the query result is sent back to the browser. If the input
includes characters of a native language, the input is processed as a natural
language input. The search inquiry will be sent to the search engine, either
remote or local, that performs an intelligent search based on the native
language characters. The search result will be sent back to the browser,
indicating the desired URL or web address.

If the input is determined to be native language pronunciation notations, i.e.,
phonetic spellings, it is further determined whether the input is a full
pronunciation notation (phonetic spelling) or an abbreviation made of the first
letters of the pronunciation notation. If the input is a full pronunciation
notation query, the query is processed in the pronunciation notation search
table to obtain the desired URL or web address, and the result is sent back to
the browser for selection. Otherwise, the input is processed in the search table
of abbreviations of first letters of pronunciation notations of the native
language. The query result of the URL or web address is sent to the browser for
selection.
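
The dispatch described in the preceding paragraphs can be illustrated with a
minimal sketch. It is not the applicant's implementation: the function name
classify_input, the URL pattern, and the test for "full pronunciation notation"
(membership in a hypothetical full-spelling table) are all assumptions made for
illustration.

```python
import re

# Hypothetical, deliberately loose URL pattern: optional scheme, dotted host,
# optional port and path.
URL_RE = re.compile(
    r"^(?:[a-z][a-z0-9+.-]*://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?::\d+)?(?:/\S*)?$",
    re.IGNORECASE,
)


def classify_input(text, full_spelling_table):
    """Return which branch of the summary would handle the input."""
    text = text.strip()
    # Any non-ASCII character is treated as a native language character.
    if any(ord(ch) > 0x7F for ch in text):
        return "native_characters"
    if URL_RE.match(text):
        return "url"
    # Assumption: an ASCII, non-URL input is a phonetic spelling. It is "full"
    # if it appears in the full-spelling table, otherwise it is handed to the
    # abbreviation table, mirroring the "Otherwise" branch of the text.
    if text.lower() in full_spelling_table:
        return "full_pinyin"
    return "pinyin_abbreviation"


table = {"zhongguo": ["http://example.com/china"]}  # illustrative data only
print(classify_input("中国", table))      # native_characters
print(classify_input("zhongguo", table))  # full_pinyin
```

A real system would need a more careful rule for telling full spellings from
abbreviations; the table lookup above is only the simplest stand-in.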

In accordance with the present invention, the intelligent search comprises
determining whether a query precisely matches a website, web address
or webpage. If there is no precisely matching website or webpage, a
list of possible search results is provided to the user for selection.

Chinese character input is difficult for many users. However, if the computer
running the browser is equipped with Chinese input software, Chinese
characters may be entered as a search inquiry. This will initiate the intelligent
search of Chinese characters. To provide users with more options, in certain
embodiments of the present invention, the system and method of intelligent
information processing may accept "Pinyin", i.e., pronunciation notations, or
"Pinyin" headers, i.e., pronunciation alphabet abbreviations of the desired query
term, so as to get a list of possible search results.

The system and method may also process telephone number input and get to
a relevant website corresponding to the registered telephone number. If a
person's name (either in Chinese or English) is entered, the person's web-card
may be retrieved from a remote webcard server, such as the one provided by
http://www.letscard.com, or any other similar server. These aspects of the
invention are disclosed in other corresponding patent applications of the same
applicant.

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings illustrate the embodiments of the present
invention, and the present invention can be better understood through the
following detailed description in connection with the accompanying drawings.

Figure 1 illustrates an example of a networked computer system that may be
utilized to execute the software of an embodiment of the invention.
Figure 2 shows one embodiment of the invention.
Figure 3 shows a process of controlling a browser's URL input window.
Figure 4 shows a screen shot of a browser with Chinese Natural Language
Access and Navigation Service.
Figures 5A, 5B, and 5C illustrate the three basic infrastructures of the
intelligent information processing in a wide area network in accordance with
the present invention.
Figure 6 shows a process for Chinese natural language processing.
Figure 7 shows another process for Chinese natural language processing.
Figure 8 shows the method of Chinese characters and/or English words
processing of the present invention.
Figure 9 shows the method of full Chinese phonetic spelling words processing
of the present invention.
Figure 10 shows the method of abbreviated Chinese phonetic spelling words
processing of the present invention.
Figure 11 illustrates the process of determining types of words of a query entry
before the information processing in accordance with the present invention.
Figures 12A and 12B illustrate, respectively, the search method of homonym
words of full phonetic spelling and the search method of full phonetic spelling
words with dialect misspellings in accordance with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

As will be appreciated by anyone skilled in the art, the present invention may
be embodied as a method, data processing system or program product.
Software written according to the present invention is to be stored in some
form of computer readable medium, such as memory or CD ROM, or
transmitted over a network, and executed by a processor. Nonetheless, the
principles of the present invention may be described in a method of intelligent
information processing in a network or a system of intelligent information
processing in a network, as stated in detail hereinafter.

Figure 1 shows a system of the present invention. A user machine/computer
101 is connected to web servers 102 and Internet resource locator servers,
such as the servers 103 and 104 at http://www.3721.com, via Internet
connections 108, 109. The user computer 101 may be any kind of
computer running the Microsoft® Windows operating system, including PCs,
Macintosh computers, an Internet appliance such as a WebTV and a wireless
Internet browsing device. The user computer 101 may be connected to the
Internet via a dial-in modem, a DSL line, a cable modem, a dedicated line such
as T1 or T3, or an optical fiber connection. A person skilled in the art would
appreciate that the invention is not limited to a specific type of user computer or
connection between the user computer and the Internet. The Internet resource
locator servers 103 and 104 include the browser pattern database 105, URL
pattern 106, and other patterns 107.

Figure 2 shows a user computer 203 connected, via Internet connection 202,
to an Internet resource locator server 201, such as the 3721 server or other
servers containing the server software of the present invention. An image of
the screen of a browser executing in the user's computer 203 is shown. Small
user-end computer software of the invention is also executing in the user's
computer 203 (see the small picture on the bottom of the screen). The small
user-end computer software intercepts the text message (msg) input from the
address box of the browser. The message is either transmitted to the Internet
resource locator server 201 for processing or processed locally by the small
user-end software.

PCT/CN01/01062

Figure 3 shows the process performed by the user-end software of the present
invention. The user-end software injects itself into all running processes using
Win32 hook technology. A hook is a point in the Microsoft® Windows
message-handling mechanism where an application can install a subroutine
or a separate module to monitor the message traffic in the system and process
certain types of messages. A hook procedure can be global, monitoring
messages for all threads in the system, or it can be thread specific, monitoring
messages for an individual thread. Some hooks may be set with system scope
only (e.g. WH_SYSMSGFILTER), but most hooks have either system or thread
scope. Teachings on the use of Win32 hooks may be found, for example, at the
Microsoft® MSDN web site (http://www.microsoft.com).

All running processes are checked to determine whether each is a target. If a
process is a target, information about the process is used to find the edit
control of the browser where users input the URL. The information may be used
to search a browser pattern library to determine which version of the browser
is executing in the user's computer. The database may be automatically updated.

Once the edit control is found, a subclass is created. The message of the Edit
Window may be a combo box selection, a drop-down selection or keyboard input.
If it is keyboard input, it is checked to see whether it is a URL address. It is
also searched against a database with a regular URL pattern library. If it is a
combo box or drop-down selection, it is processed as shown in Figure 3.

Figure 4 shows an image of a browser (in Chinese version) interacting with the
user-end software of the present invention. A user enters the word "computer"
in Chinese in the address box of the browser, and a list of addresses in Chinese
related to this word is generated.

Nonetheless, nowadays, the web search for desired websites is carried out not
only through English words, using either URLs or keywords, but also in other
native languages, such as Chinese. This requires a pertinent information
processing method or system that can effectively and accurately carry out such
web searches using the native languages.

It can be appreciated that a search is normally carried out through a database
that contains particularly designed search tables to facilitate various search
tasks. Web search in, for instance, the Chinese language is no exception.
For the purpose of carrying out the search of the present invention, the
Internet resource locator server should contain at least a Chinese
character search index table, a full phonetic spelling (Pinyin) search index
table, and a phonetic spelling alphabet abbreviation (Pinyin header) search
index table of Chinese words.

Normally, when a query of keywords is entered, the entered phrases of the
keywords are broken down into several meaningful words that are matched
against the search table of predetermined structure. Then, the results for the
words are considered together to determine the final result or results of the
query. However, for some native languages, such as Chinese, the entered
query may be in Chinese characters. Each character may or may not have
an exact meaning, and a combination of one character with other characters
may create various meaningful Chinese words. Hence, a simple breakdown of
a query in Chinese may not assure an accurate result of the query. Thus, the
present invention separates the entered phrase or characters of the query into
meaningful Chinese words of all possible combinations of the entered Chinese
characters.

For instance, the first character is not simply combined with the following
second and/or third characters to get a meaningful word, with the
subsequent characters, after the previous combination, then forming any other
meaningful words. In the present invention, the first character is
combined with any of the entered characters to form all possible
meaningful words for the query. Therefore, the obtained query results may
assure the accuracy of the query, because the results come from all of these
possible combined meaningful words.
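
As a rough illustration of splitting a query into all possible meaningful
words, the sketch below enumerates every contiguous run of characters and keeps
those found in a word list. The function name candidate_words and the sample
word list are hypothetical, and restricting the combinations to contiguous
substrings is an assumption; the text does not say how the combinations are
formed.

```python
def candidate_words(query, word_list):
    """Return every dictionary word that occurs as a substring of the query."""
    found = []
    n = len(query)
    for start in range(n):
        for end in range(start + 1, n + 1):
            piece = query[start:end]
            # Keep overlapping words too, instead of committing to one split.
            if piece in word_list and piece not in found:
                found.append(piece)
    return found


words = {"中国", "国人", "人民", "中国人"}  # illustrative word list
print(candidate_words("中国人民", words))
# ['中国', '中国人', '国人', '人民']
```

A greedy left-to-right split would return only one segmentation; the exhaustive
version keeps the overlapping words as well, which is the behavior the
paragraph above describes.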

The possible query inputs in Chinese based websites are Chinese character
inputs, URL inputs, and Pinyin inputs, which further include full phonetic
spelling inputs, first letter abbreviations of phonetic spelling, homonyms of
phonetic spelling inputs, and local accent phonetic spelling inputs. Before
going into the details of the method and system of the present invention for
each of the aforesaid inputs, a discussion of the current techniques of Chinese
inputting may assist the better understanding of the present invention.

The major encoding systems for Chinese are Big 5 and Guobiao (i.e., national
standard). Generally, Big 5 is preferred for processing traditional Chinese
characters and Guobiao for the simplified characters. Under the Big 5 encoding
system popular in Hong Kong and Taiwan, the coding for 天 (tian, "sky") is
1101000110100100. The Guobiao encoding for "tian" is 1110110011001100.
Note that the Big 5 code or Guobiao code for "tian" above begins with a 1,
while the ASCII code for the letter "A" begins with a 0. This pattern holds
generally true, that is, all Chinese codes begin with 1 and all ASCII codes begin
with 0. In this manner, in a file that contains both English and Chinese text,
the system can detect whether a given byte is intended as English or Chinese.

Entering (inputting) and processing Chinese language text on a computer is a
very difficult problem. The sheer number of Chinese characters illustrates this
difficulty. In the square-character (Hanzi) writing system of Chinese, there are
3000 to 6000 commonly used Chinese characters (Hanzi). Including the
relatively rare ones, there are more than ten thousand Chinese characters.
Adding to this difficulty, there are problems in the Chinese language with text
standardization, multiple homonyms, and ill-defined word boundaries that
impede effective text processing of Hanzi with computers. In spite of intensive
studies for several decades and the existence of hundreds of different methods,
computer input and processing of Chinese is a major stumbling block
preventing the use of computers in China, particularly for text processing.

At present, computer systems available for inputting and processing Chinese
language text may be divided into three categories. The first category is based
on a decomposition of the Chinese characters into elementary graphical
components. The decomposition of Chinese characters under each method is not
unique. Therefore, it is rather difficult for people to learn those methods.

The second and third categories are based on pronunciation, such as the full
phonetic spelling method. These methods encounter a "homonym problem" in
Chinese language processing. The second category is phonetic input (e.g.
"Pinyin" for mainland China and "phonetic symbols" or BPMF for Taiwan), which
is the most commonly used method for everyone except professional typists.
The Chinese character writing system of the Chinese language is a conceptual
and practical barrier to this method.

Although there are only about 1300 different phonetic syllables, in contrast to
tens of thousands of characters, one phonetic syllable may correspond to
many different Chinese characters. For example, the pronunciation of "yi" in
Mandarin can correspond to over 100 Chinese characters. This creates
ambiguities when translating the phonetic syllables, as the inputs, into the
corresponding Chinese characters.

To address this "homonym problem," most of the phonetic input systems use a
multiple-choice method. See, for example, German patent 3,142,138, issued
May 5, 1983 to J. Heinzl et al.; U.S. Patent No. 5,047,932, issued September
10, 1991 to K. C. Hsieh; and Chinese Patent Publication No. 1064957, issued
March 8, 1991 to Tan Shanguang. After a phonetic syllable is keyed in, the
computer displays all possible characters with the same pronunciation. In
some cases, there is not enough space on the screen to display all possible
characters with the same pronunciation. This requires scrolling up and down.
Therefore, these phonetic methods, based on individual syllables, are very
slow.

An improvement to the multiple-choice methods, based on deriving the
probability of the adjacent Chinese characters, is disclosed in, for example,
British Patent 2,248,328, issued on April 1, 1992 to R. W. Sproat. The
probability approach can further be combined with grammatical constraints. See,
for example, K. T. Lua et al., Computer Processing of Chinese and Oriental
Languages, Vol. 6, Num. 1, page 85, June 1992. However, the conversion accuracy
(phonetic to characters) of these methods is typically limited to around 80%.

The third category combines a phonetic-character input method with the
addition of non-phonetic letters. Non-phonetic letters are added to the phonetic
letters to artificially discriminate characters with the same pronunciation.
Examples include phonetic spelling with radical marks (British Patent No.
2,158,776, issued Nov. 20, 1985 to C. C. Chen) and phonetic spelling with
number of strokes (Chinese Patent Publication No. 1066518, issued November
25, 1992 to G. Xre). These methods require memorizing artificial rules or
counting the number of strokes, which slows down the speed of input
substantially.

Other methods for inputting Chinese characters are described in, for example,
U.S. Patent No. 6,073,146. The '146 patent teaches a system employing a
keyboard with diacritic keys (and corresponding ASCII coding) that permit the
user to annotate each entered phonetic text syllable with a diacritic that
indicates the tone of the syllable. A process executing on the system
determines that a syllable has been entered when a diacritic (or delimiter) key
is struck. Each entered phonetic syllable is then compared to a list of
acceptable phonetic syllables and abbreviations. If the entered syllable is on
the list, the correctly spelled and accented syllable is stored in memory and
displayed on a phonetic portion of a graphical display. The process continues
for succeeding syllables until a delimiter is entered. Upon encountering a
delimiter, the word string (defined as the string of characters between two
delimiters) is analyzed using morphological and syntactical processes and/or a
statistical language model to unambiguously determine the proper Chinese
characters that represent the word(s) in the word string. The unique Chinese
translation is stored in memory and displayed on a Chinese character portion of
the graphical interface.

In accordance with the present invention, the query index data structures for
Internet keyword search are illustrated in Figures 5A, 5B, and 5C. These are the
approximate infrastructures of the three search index tables of the present
invention. In order to realize high speed intelligent search of Internet
keywords, it is very important to establish a highly efficient data
infrastructure that is suitable for searching massive data. The three data
structures of the present invention are (1) the index table for intelligent
search for identifying words or phrases of normal Chinese characters and English
words; (2) the index table for intelligent search based on full phonetic
spelling of Chinese characters; and (3) the index table for intelligent search
based on phonetic spelling alphabetic abbreviation.

With respect to Figure 5A, the index table is a Chinese or English Word List
that contains all Chinese or English words, for instance, "China", "software",
"computer", "ibm", etc. In the Chinese or English Word List, each word is
connected to an Internet Keyword Entry Point List. In such a list, each point
indicates a pointer pointing toward the actual storage space of an Internet
Keyword in which such a word is contained. Therefore, the system may search for
all Internet Keywords that contain the word, either in Chinese or English, from
the Internet Keyword Entry Point List linked to each of said words.
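
The structure of Figure 5A resembles an inverted index. The sketch below is one
possible reading of it, with hypothetical names (build_index) and sample data;
list positions stand in for the pointers to stored Internet Keywords.

```python
from collections import defaultdict


def build_index(internet_keywords, word_list):
    """Map each word to the positions of the Internet Keywords containing it."""
    index = defaultdict(list)
    for position, keyword in enumerate(internet_keywords):
        lowered = keyword.lower()
        for word in word_list:
            if word in lowered:
                # The position plays the role of an entry point (pointer).
                index[word].append(position)
    return index


keywords = ["China Software Network", "IBM China", "Computer World"]
index = build_index(keywords, ["china", "software", "computer", "ibm"])
print(index["china"])                          # [0, 1]
print([keywords[p] for p in index["ibm"]])     # ['IBM China']
```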

With respect to Figure 5B, the data structure is similar to the one in Figure 5A,
except that the Chinese words on the left side are in the form of Pinyin, i.e.,
phonetic spellings. For instance, the above given words in Chinese are now
"zhongguo", "ruanjian", "diannao", etc. The linked Internet Keyword Entry Point
List is a list of the Internet Keywords that contain such a word in Chinese
phonetic spelling form.

Figure 5C also has a data structure similar to the one in Figure 5A. The
difference is that on the left side of the word table each of such words is in
the form of phonetic spelling alphabetic abbreviations, such as "zg", "rj",
"dn", etc. Thus, the related Internet Keyword Entry Point List includes words
corresponding to these phonetic spelling alphabetic abbreviations for the query.
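
The relationship between the keys of Figures 5B and 5C can be shown with a
small sketch: given the Pinyin syllables of a word, the full spelling is their
concatenation and the abbreviation is the first letter of each syllable. The
function names and the syllable lists are illustrative assumptions.

```python
def full_spelling(syllables):
    """Key for the full phonetic spelling table (Figure 5B)."""
    return "".join(syllables)


def abbreviation(syllables):
    """Key for the abbreviation table (Figure 5C): first letter of each syllable."""
    return "".join(s[0] for s in syllables)


for word in (["zhong", "guo"], ["ruan", "jian"], ["dian", "nao"]):
    print(full_spelling(word), abbreviation(word))
# zhongguo zg
# ruanjian rj
# diannao dn
```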

From these three figures, it can be seen that the three basic intelligent search
methods have similar data structures, but have the words stored in different
forms: Chinese or English words, full phonetic spelling (Pinyin), or phonetic
spelling alphabetic abbreviations (headers of phonetic spelling words).
Therefore, it can be understood that the internal computing method for these
three kinds of search is the same. The key is how these words are grouped
or selected from the query to form meaningful search words, that is, as
discussed above, how the query is broken up into several combinations of
characters indicative of all possible meaningful words, so that every
possible search word points to the Internet Keywords on the list, and how
the query is identified as a Chinese character entry or English word entry, a
full phonetic spelling word entry, or a phonetic spelling alphabetic
abbreviation entry. The corresponding methods according to the present invention
are discussed hereinafter.

Despite the development of easier methods, inputting Chinese characters is
still an extremely difficult task, particularly if the Internet device is a
handheld device such as a Personal Data Assistant or a cell phone with a
wireless Internet connection. In one aspect of the invention, methods for
simplifying the entry of Chinese characters are provided. The methods are
particularly useful for entering web addresses or natural language keywords or
names of a web site (page). Figure 6 shows one embodiment of the invention. In
this method, the user types in the first letter of the Pinyin spelling of a
Chinese word, as indicated at 501. The first letter is used to query a database,
and a list of possible URLs is displayed, as indicated at 502. The list may be
ordered based upon statistical information such as frequency of requests. In
other words, the most popular URLs are listed first, as indicated at 503.
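
A minimal sketch of the Figure 6 flow follows, assuming a hypothetical list of
entries that each carry a Pinyin spelling, a URL and a request count. The
field names, the example.com addresses and the counts are invented for
illustration only.

```python
def suggest(first_letter, entries, limit=10):
    """List URLs whose Pinyin starts with the letter, most requested first."""
    letter = first_letter.lower()
    matches = [e for e in entries if e["pinyin"].startswith(letter)]
    # Order by request frequency so the most popular URLs come first.
    matches.sort(key=lambda e: e["requests"], reverse=True)
    return [e["url"] for e in matches[:limit]]


entries = [
    {"pinyin": "zhongguo", "url": "http://example.com/china", "requests": 40},
    {"pinyin": "zhaopin", "url": "http://example.com/jobs", "requests": 75},
    {"pinyin": "diannao", "url": "http://example.com/computer", "requests": 90},
]
print(suggest("z", entries))
# ['http://example.com/jobs', 'http://example.com/china']
```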

In another embodiment of the invention, as seen in Figure 7, the Pinyin spelling
of a Chinese word is inputted at 601. The spelling is checked to determine
whether it contains frequent misspellings at 602. Misspelling frequently occurs
because of accent. In the southern part of China, because of the southern
accent, many southerners make phonetic spelling mistakes for Chinese characters.
If the phonetic misspelling occurs due to the southern accent, the system of the
present invention will correct it automatically at 605. If the query does not
have any phonetic misspelling, or the misspelling has been corrected, the system
will then check a database of related URLs at 603. The output will be displayed
at 604.
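
The text does not say which misspellings are corrected. The sketch below
assumes one well-known confusion, dropping the "h" from the initials zh, ch and
sh, and matches a query to a stored spelling by comparing both after that
difference is removed. The function names, the folding rule and the sample
table are all assumptions, not the applicant's method.

```python
import re


def fold_accent(pinyin):
    """Collapse zh/ch/sh to z/c/s so both spellings share one key."""
    return re.sub(r"([zcs])h", r"\1", pinyin.lower())


def correct_spelling(query, url_table):
    """Return the stored spelling that the query most plausibly stands for."""
    if query in url_table:  # no misspelling detected
        return query
    folded = {fold_accent(key): key for key in url_table}
    return folded.get(fold_accent(query))  # None if nothing matches


def lookup(query, url_table):
    """Correct the spelling if needed, then check the table of related URLs."""
    key = correct_spelling(query, url_table)
    return url_table[key] if key is not None else []


table = {"zhongguo": ["http://example.com/china"]}  # illustrative data only
print(lookup("zongguo", table))
# ['http://example.com/china']
```

If two stored spellings fold to the same key, this sketch keeps only one of
them; a fuller version would return every candidate for the user to choose from.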

The small user-end software that is supported through a back-end intelligent
search engine and database exemplifies one embodiment of the invention. The
software may be downloaded from http://www.3721.com. Users do not need to
know or type the long and complicated alphabetical URLs. Instead, they simply
type Chinese characters in the web address box for familiar brands or product
names, and they will be brought to their desired destination sites or related
webpages. For example, instead of typing http://www.legend.com.cn, users
can simply type "Legend Computers" in Chinese and will get to the site they
wish to visit.

Turning now to the key features of the present invention, Figure 8 shows the
basic flow chart of the Chinese character and/or English words search of the
present invention. After the query string A in the form of Chinese characters
and/or English words is entered at 801, the system will parse the query string A
against the Chinese English Words List (CEWL), and split the query string A into
one or more Chinese words: W = {W1, W2, ..., Wn} at 802. For each word Wx in W,
at 803 the system looks up the word Wx in the CEWL to find the attached
Internet Keyword Entry Point List (IKEPLx), and then each node in the IKEPLx
will point to an Internet Keyword (IK) containing the word Wx.

The system will combine all of IKEPL1, IKEPL2, ..., IKEPLn and get the result R
at 804, that is, R = IKEPL1 ∪ IKEPL2 ∪ ... ∪ IKEPLn. Since each IKEPLx points
to an IK containing a word Wx, an IK in R will then contain at least one word
Wx in W. At 805, while doing the combination, the system will calculate the
weight of each IK in R according to specified rules, such as the following:

(1) Weight of count: the number of words within W that the IK contains.

(2) Weight of length: the total length of words within W that the IK contains...

Finally, the system will calculate the comprehensive weight of each IK based
on the above rules. After the calculation, at 806 the system will sort the result
list R according to the weight of each IK, such that the best-matching result
appears at the head of the list, and the system will limit the number of results
in R. Then, the final IK list R appears at 807.
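
Steps 803 to 807 can be sketched as follows. The text does not specify how the
individual weights are combined into the comprehensive weight, so the sketch
assumes the simplest choice: order by weight of count, then by weight of
length. The function name search, the sample index and the sample Internet
Keywords are hypothetical, and the words in W are assumed to be distinct.

```python
def search(words, index, keywords, limit=10):
    """Union the entry point lists of all words, then weight, sort and trim."""
    count_weight = {}   # rule (1): how many words of W the IK contains
    length_weight = {}  # rule (2): total length of those words
    for word in words:
        for pos in index.get(word, []):  # IKEPLx for this word
            count_weight[pos] = count_weight.get(pos, 0) + 1
            length_weight[pos] = length_weight.get(pos, 0) + len(word)
    # The keys of count_weight form R, the union of all entry point lists.
    ranked = sorted(
        count_weight,
        key=lambda pos: (count_weight[pos], length_weight[pos]),
        reverse=True,
    )
    return [keywords[pos] for pos in ranked[:limit]]


keywords = ["中国计算机报", "中国银行", "计算机世界"]
index = {"中国": [0, 1], "计算机": [0, 2]}
print(search(["中国", "计算机"], index, keywords))
# ['中国计算机报', '计算机世界', '中国银行']
```

The same routine would serve the full phonetic spelling and abbreviation
searches if the index and the words were given in the corresponding form.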

Likewise, as seen in Figure 9, the entered query string A is in the form of full
phonetic spelling at 901. After the entry of the string A, the system parses the
string A against the Full Chinese Pinyin Words List (FCPWL) and splits it into
one or more Chinese phonetic spelling words W = {W1, W2, ..., Wn} at 902. For
each word Wx in W, at 903 the system looks it up in the FCPWL to find the
attached Internet Keyword Entry Point List IKEPLx; each node in
IKEPLx points to an Internet Keyword (IK) whose phonetic spelling
contains Wx. Then, at 904, the system combines IKEPL1, IKEPL2, ...,
IKEPLn to obtain a result R = IKEPL1 ∪ IKEPL2 ∪ ... ∪ IKEPLn. Thus, each
IK in R has a phonetic spelling containing at least one word Wx in W. The
following steps 905-907 are very much the same as steps 805-807, that is,
calculating the weight of each IK in R according to specified rules; sorting the
result list R according to the weight of each IK, so that the closest match
appears at the head of the list, and limiting the number of results in R; and
finally obtaining a result IK list R.
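
The splitting at step 902 can be illustrated with a short Python sketch. The description does not say how the string is split against the FCPWL; greedy longest-match segmentation is assumed here, and the word list contents and the name `split_query` are hypothetical.

```python
# Hypothetical FCPWL entries (full pinyin spellings of listed words).
FCPWL = {"beijing", "daxue", "bei", "jing"}

def split_query(query, word_list):
    """Greedy longest-match split; characters that start no word are skipped."""
    words, i = [], 0
    max_len = max(map(len, word_list), default=0)
    while i < len(query):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(query) - i), 0, -1):
            if query[i:i + n] in word_list:
                words.append(query[i:i + n])
                i += n
                break
        else:
            i += 1  # no listed word starts at this position
    return words

print(split_query("beijingdaxue", FCPWL))
```

Longest match prefers "beijing" over the shorter entries "bei" and "jing"; a real system might instead keep several candidate segmentations.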

By the same token, as seen in Figure 10, a user inputs a query string A in
the form of abbreviated Chinese phonetic spelling at 11. The system parses
the string A against the ACPWL and splits the string A into one or more
abbreviated Chinese phonetic spelling words W = {W1, W2, ..., Wn} at 12. Then
at 13, for each word Wx in W, the system looks up the word in the ACPWL to find
the attached Internet Keyword Entry Point List IKEPLx; each node in
IKEPLx points to an Internet Keyword (IK) whose abbreviated phonetic
spelling contains the word Wx. Then at 14, the system combines IKEPL1,
IKEPL2, ..., IKEPLn to get a result R = IKEPL1 ∪ IKEPL2 ∪ ... ∪ IKEPLn, and
thus each IK in R has an abbreviated phonetic spelling containing at least one
word Wx in W. The following steps 15-17 are substantially the same as those in
Figures 8 and 9, that is, calculating the weight of each IK in R according to
specified rules; sorting the result list R according to the weight of each IK,
such that the closest match appears at the head of the list, and limiting the
number of results in R; and obtaining the final result IK list R.

On the basis of the above three kinds of intelligent search modes, i.e., for
Chinese characters and/or English words, full Chinese phonetic spelling words,
and abbreviated Chinese phonetic spelling words, the method and system of
intelligent information processing in a wide area network, according to the
present invention, determines whether the query entry is a string of Chinese
characters and/or English words, full Chinese phonetic spelling words, or
abbreviated Chinese phonetic spelling words, as shown in Figure 11. That is,
after the entry of a string A at 110, the system determines whether the
entered query string is in the form of full Chinese phonetic spelling words at
111. If it is, the system carries out the calculation in accordance with the
intelligent search method of full phonetic spelling words search as shown in
Figure 9.

If it is not a string of full Chinese phonetic spelling words, the system
determines whether the query string is in the form of abbreviated Chinese
phonetic spelling words at 112. If it is, the system carries out the calculation
of abbreviated Chinese phonetic spelling words as shown in Figure 10. If it is
not, the system determines that the query string is in the form of Chinese
characters and/or English words, and carries out the calculation of the same
as shown in Figure 8. In addition, at 113 the system determines
whether the calculation result of either the full Chinese phonetic spelling
words search or the abbreviated Chinese phonetic spelling words search is
empty. If it is empty, the system falls back to the calculation of the Chinese
characters and/or English words search as seen in Figure 8. If the result of
the search mode of Figure 9 or Figure 10 is not empty, that result
is taken as the final result.

Figure 12A illustrates a search method for homonym words of full phonetic
spelling in accordance with the present invention. After the query string is
entered at 121, the system analyzes all possible homonym words
and generates all of these words as searchable words of full Chinese phonetic
spelling at 122. For each of the homonym words of full Chinese phonetic
spelling, the system carries out, at 123, the calculation of the full Chinese
phonetic spelling words search as discussed with respect to Figure 9. After
obtaining all search results Rn, the system analyzes the results Rn and
obtains the final and most probable result, or a limited number of results,
at 124.

Figure 12B illustrates a search method for full phonetic spelling words with
dialect misspellings in accordance with the present invention. Furthering the
method and system of Figure 7, after the entry of a query string of phonetic
spelling words at 125, the system of the present invention analyzes, at 126,
the entered words against a table listing all possible misspelled consonants or
vowels for corresponding Chinese characters by southerners, such as "huang"
and "wang", "shi" and "si", "lu" and "10", etc. In any case, the possible
misspellings are enumerated in the table. Thus, the entered query string is
expanded into several words of phonetic spelling to cover all possible
spellings, and these are then calculated through the method of full phonetic
spelling search to obtain all possible IKs of the result at 127. Then, the
search results are analyzed to obtain the final and most probable result or
results at 128.
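
The expansion at step 126 can be sketched as follows. This is a minimal sketch under stated assumptions: the confusion table is modeled as pairs of whole syllables taken from the examples above, the query is assumed to be already split into syllables, and the names `CONFUSABLE` and `variants` are hypothetical. Each generated spelling would then be passed to the full phonetic spelling search.

```python
from itertools import product

# Hypothetical confusion table: pairs of syllables treated as interchangeable.
CONFUSABLE = [("huang", "wang"), ("shi", "si")]

def variants(syllables):
    """Return every spelling obtained by swapping confusable syllables."""
    alt = {}
    for a, b in CONFUSABLE:
        alt.setdefault(a, {a}).add(b)
        alt.setdefault(b, {b}).add(a)
    # Each position offers the syllable itself plus its listed alternatives.
    options = [sorted(alt.get(s, {s})) for s in syllables]
    return ["".join(combo) for combo in product(*options)]

print(variants(["huang", "shi"]))
```

The number of candidates grows multiplicatively with the number of confusable syllables, which is one reason the results are analyzed and narrowed down afterwards.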

It can be understood that the above description is intended to be illustrative
and not restrictive. Many variations of the invention will be apparent to those
skilled in the art upon reviewing the above description. The scope of the
invention should, therefore, be determined not only with reference to the above
description, but also with variations and equivalents. While the invention has
been described in conjunction with the preferred embodiments, it will be
understood that they are not intended to limit the invention to these
embodiments. On the contrary, the invention is intended to cover alternatives,
modifications and equivalents, which may be included within the spirit and
scope of the invention.



